Digital echo cancellation techniques make it possible to realize efficient 
full-duplex data transmission over a single loop. The purpose of this paper is 
to elucidate the solution to the start-up problem in these devices and to present 
a new, fast, and simple tap-adjustment procedure. The theory indicates that a 
modified stochastic gradient tap-adjustment algorithm, using pseudorandom 
input data sequences for the initial training period, converges in N steps, 
where N is the total number of canceler taps, and that this is the fastest 
possible convergence time. 

I. INTRODUCTION 

Two-way voice communication over a single loop is made possible 
by the use of a hybrid bridge. However, the suppression of echoes by 
fixed hybrids is insufficient to support full-duplex data transmission, 
so adaptive data echo cancelers are necessary. 

In two-way data communications, transmitted data "echoes" back 
to the near-end receiver after being reflected and dispersed through 
an unknown return path. So, if one assumes that the echo path is 
linear, the estimation of its overall impulse response is sufficient to 
allow the synthesis of one's own echo signal. This synthesized version 
is subtracted from the received signal, which then makes it possible 
for the receiver to extract the data intended for it. 

The impulse response of the echo channel (local transmitter output 
to local receiver input) can be measured in several ways. An obvious 
way to do this is to transmit a single impulse and measure the echo. 
However, among other defects of this procedure, the average power 
would be very low and the resulting signal-to-noise ratio (s/n) would 
be inadequate. If a pseudorandom sequence (+1 or -1) is transmitted 
instead, the average power would be much greater and would be 
essentially constant on the line, more nearly representing a true data 
signal. The latter is the preferable approach. 

Digital data echo cancelers operate in two modes. In the acquisition 
mode, or start-up, the impulse response of the echo path is measured. 
This is best accomplished, as we shall see, with the use of fixed data 
sequences. Since, during this period, no information is conveyed to 
the far-end, the time allotted for this purpose should be as short as 
possible. Although it is conceptually possible to start an echo canceler 
either blind or with random data, the convergence of the taps, or the 
reliable measurements of the impulse response, are known to require 
a long time. In the subsequent mode, or during actual data transmis- 
sion, a tracking algorithm is initiated whose function is to update the 
measurements when slight changes occur in the impulse response. We 
focus on the more critical start-up algorithm. 

From an operational point of view, it is desirable to implement these 
algorithms in a recursive fashion, or in a closed-loop manner. This 
means that the canceler tap coefficients, which represent the sampled 
impulse response, are updated in response to a measured error between 
the actual impulse response and the one estimated at any particular 
instant. Conventional gradient adjustment algorithms, even with fixed 
data sequences, are known to converge very slowly. 

This research was motivated by the need for a theory capable of 
explaining the behavior of tap-adjustment algorithms. During the 
course of this investigation a modified stochastic gradient algorithm 
that is simple to implement and converges in the theoretically smallest 
number of steps was discovered. 

II. PROBLEM FORMULATION 

Figure 1 shows a full-duplex data modem employing a digital echo 
canceler. We are concerned with the sampled signal values 

$$ r_n = \sum_k h_k a_{n-k} + \nu_n, \tag{1} $$

where the $a_n$'s are the transmitted data symbols that echo back to the 
receiver, the $h_n$'s are the overall impulse response values of the echo 
path, and the $\nu_n$'s are the desired received signal values plus noise. 
The object of the canceler is to synthesize signal values, $c_n$, which are 
estimates of $\sum_k h_k a_{n-k}$, and subtract them from the $r_n$'s. The receiver 
then proceeds to process the difference signal values, $r_n - c_n$, to extract 
the data intended for it. 
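
The signal model and the subtraction performed by the canceler can be sketched numerically. The fragment below is a minimal illustration, assuming a finite echo response of N taps and a canceler that already knows the response exactly; the function names and the sample values are hypothetical and are not taken from the paper.

```python
def echo_component(h, a, n):
    """Echo at time n: sum over k = 1..N of h_k * a_{n-k}; h[k - 1] holds h_k."""
    return sum(h[k - 1] * a[n - k] for k in range(1, len(h) + 1))


def received(h, a, v):
    """Received samples r_n = echo_n + v_n for every n with a full data history."""
    N = len(h)
    return {n: echo_component(h, a, n) + v[n] for n in range(N, len(a))}


def cancel(r, h_est, a):
    """Subtract the synthesized echo c_n (built from h_est) from each r_n."""
    return {n: r_n - echo_component(h_est, a, n) for n, r_n in r.items()}


# Hypothetical values: 3-tap echo path, +/-1 data, far-end signal plus noise v.
h = [0.5, -0.25, 0.125]
a = [1, -1, -1, 1, 1, 1, -1, 1]
v = [0.0, 0.0, 0.0, 2.0, -2.0, 2.0, 2.0, -2.0]

r = received(h, a, v)
residual = cancel(r, h, a)  # with a perfect estimate only v_n remains
```

With an imperfect estimate `h_est`, the residual would contain the uncancelled echo in addition to `v`, which is why the remainder of the paper concentrates on estimating the response quickly.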

The fundamental problem is to devise procedures for estimating the 
$h_n$'s from observations of the $r_n$'s, while treating the $\nu_n$'s as undesirable 
noise. Practically, it must be assumed that only a finite number, $N$, of 
the $h_n$'s can be estimated, and so we express (1) compactly as 

$$ r_n = H'A_n + \nu_n, \tag{2} $$

where $H$ and $A_n$ are finite dimensional column vectors,* 

$$ H = \begin{pmatrix} h_N \\ \vdots \\ h_1 \end{pmatrix}, \quad \text{and} \quad A_n = \begin{pmatrix} a_{n-N} \\ \vdots \\ a_{n-1} \end{pmatrix}, $$

and where $(\ )'$ indicates the transpose of a matrix. 

In the absence of precise statistical knowledge of the $\nu_n$'s and $H$, a 
natural procedure for choosing an estimator of $H$ is to minimize the 
sum of squared errors from time 1 to time $l$, 

$$ \mathcal{E}_l = \sum_{n=1}^{l} (r_n - H'A_n)^2. $$

This is a standard problem and the solution is immediate. It involves 
the solution of a set of linear equations 

$$ Z_l H_l = U_l, \tag{3} $$

for the best estimator at time $l$, $H_l$. In (3), 

$$ U_l = \sum_{n=1}^{l} A_n r_n \quad \text{and} \quad Z_l = \sum_{n=1}^{l} A_n A_n'. $$

* We deal with a baseband model for notational convenience. By using complex 
numbers throughout, the treatment generalizes to passband models. 

In applications it is usually desirable to solve these equations recur- 
sively with a minimum computational effort and to assure rapid 
convergence to the actual H. Our attention in the next sections is 
directed toward these aims. However, before dealing with our main 
subject we wish first to examine some asymptotic behaviors of the 
standard solution. 
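
The direct (non-recursive) solution of (3) can be sketched as follows. This is a minimal illustration under stated assumptions: the sequence index starts at 0 rather than 1, the data and tap values are hypothetical, and a plain Gaussian elimination stands in for whatever linear solver would be used in practice.

```python
def data_vector(a, n, N):
    """A_n = (a_{n-N}, ..., a_{n-1})', matching H = (h_N, ..., h_1)'."""
    return a[n - N:n]


def normal_equations(a, r, N):
    """Accumulate Z = sum A_n A_n' and U = sum A_n r_n over all usable n."""
    Z = [[0.0] * N for _ in range(N)]
    U = [0.0] * N
    for n in range(N, len(a)):
        A = data_vector(a, n, N)
        for i in range(N):
            U[i] += A[i] * r[n]
            for j in range(N):
                Z[i][j] += A[i] * A[j]
    return Z, U


def solve(Z, U):
    """Solve Z H = U by Gauss-Jordan elimination with partial pivoting."""
    N = len(U)
    M = [row[:] + [u] for row, u in zip(Z, U)]
    for col in range(N):
        pivot = max(range(col, N), key=lambda i: abs(M[i][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("Z is singular; more data are needed")
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(N):
            if i != col:
                f = M[i][col] / M[col][col]
                M[i] = [x - f * y for x, y in zip(M[i], M[col])]
    return [M[i][N] / M[i][i] for i in range(N)]


# Hypothetical noise-free example with N = 2 taps, H = (h_2, h_1)'.
H_true = [0.25, -0.5]
a = [1, 1, -1, -1, 1, 1, -1, -1]
r = [0.0] * len(a)
for n in range(2, len(a)):
    r[n] = sum(h * x for h, x in zip(H_true, data_vector(a, n, 2)))

Z, U = normal_equations(a, r, 2)
H_est = solve(Z, U)
```

The elimination step costs on the order of N cubed operations per solve, which illustrates why a recursive procedure with little computation per sample is sought.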

2.1 Random input data 

In some applications, echo cancelers must start blind, i.e., from 
random data. This is the case in echo cancelers used to suppress 
speech. However, in full-duplex data communications, a preamble 
word, or words, can be sent first to assure rapid convergence. To 
emphasize these differences we examine the behavior of the estimator 
with random data first, where one has no choice in the selection of 
the starting sequences. So, consider random data such that the $a_n$'s 
assume ±1 independently with equal probability and examine the limit 
as $l \to \infty$. It is found that 

$$ U_l \to l\,E[A_n r_n] = l\,E[A_n A_n']\,H, $$

$$ Z_l \to l\,E[A_n A_n'] = l\,I, $$

and, consequently, 

$$ Z_l^{-1} U_l \to H. \tag{4} $$



In the above we made use of the fact that the $\nu_n$'s and $A_n$'s are naturally 
independent. We assumed that these sequences are ergodic and so 
replaced time averages with mathematical expectations, $E[\cdot]$. This 
then demonstrates that if one is willing to wait forever, it is 
conceptually possible to determine $H$ exactly, not a terribly startling result. 
While the asymptotic behavior is easy to deduce, the statistical 
behavior of the estimator for finite $l$ is difficult to glean. One 
immediately encounters an unsolved mathematical problem that involves 
the conditions on the random sequences that would guarantee the 
existence of the inverse matrix $Z_l^{-1}$. Clearly, $l$ has to be greater than 
$N$ for the inverse to even have a chance to exist, and for those 

Likewise, the vector 

$$ U_l = \sum_{n=1}^{l} A_n r_n = U_{l-1} + A_l r_l. \tag{13} $$

Now, applying the matrix inversion lemma [2] to (12), and assuming that 
the inverses exist, we find that 

$$ Z_l^{-1} = Z_{l-1}^{-1} - \frac{Z_{l-1}^{-1} A_l A_l' Z_{l-1}^{-1}}{1 + A_l' Z_{l-1}^{-1} A_l}. \tag{14} $$

This is the key equation and, in conjunction with (13), makes it 
possible to claim (11). We note that the algorithm expressed in (13) is 
computationally complicated since it requires the calculation of a 
matrix recursion, (14), and multiplication of matrices by vectors at 
each iteration. For large N this becomes infeasible and simpler pro- 
cedures are therefore sought. However, before proposing a simpler 
algorithm we need to review some properties of pseudorandom se- 
quences. 

The pseudorandom data sequences that are the inputs to the can- 
celer during the start-up period derive from the binary sequences 

$$ X = x_1 x_2 x_3 x_4 \cdots $$

($x_j$ = 0 or 1). The $n$th digit is computed from certain of the earlier 
digits by means of the recurrence relation 

$$ x_n = x_{n-c} + x_{n-b} \pmod 2, $$

where $c$ and $b$ are integers, $0 < c < b$. The actual data sequences $\{a_n\}$ 
that are applied to the canceler are the $x_n$'s with "0" replaced by "-1". 

Returning now to the sequence $X$, we remark that in spite of the 
fact that $x_n$ is completely determined by the digits that precede it, the 
sequence $X$ resembles in some respects a completely random sequence. 
The calculation of the sequence $X$ is carried out in a shift register 
working in a closed loop with a mod 2 adder. It turns out that 
for special choices of $c$ and $b$ the sequence $X$ is periodic with period 
$2^b - 1$ [2]. 
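
The recurrence can be sketched in a few lines. This is an illustration only: the function names, the default starting digits, and the choice b = 3, c = 1 (which gives period 2^3 - 1 = 7) are assumptions made for the example.

```python
def pn_bits(b, c, count, seed=None):
    """Binary digits from x_n = x_{n-c} + x_{n-b} (mod 2).

    The first b digits come from `seed` (default 1, 0, ..., 0); an all-zero
    seed is rejected because it would generate only zeros.
    """
    if not 0 < c < b:
        raise ValueError("need 0 < c < b")
    x = list(seed) if seed is not None else [1] + [0] * (b - 1)
    if len(x) != b or not any(x):
        raise ValueError("seed must hold b digits, not all zero")
    while len(x) < count:
        x.append((x[-c] + x[-b]) % 2)
    return x[:count]


def to_symbols(bits):
    """Map binary digits to data symbols: 0 becomes -1, 1 stays +1."""
    return [1 if bit else -1 for bit in bits]


def cyclic_autocorrelation(a, lag):
    """Sum of a_i * a_{i+lag} over one period, indices taken cyclically."""
    N = len(a)
    return sum(a[i] * a[(i + lag) % N] for i in range(N))


bits = pn_bits(b=3, c=1, count=14)      # two periods of length 7
symbols = to_symbols(bits[:7])          # one period of +/-1 symbols
lags = [cyclic_autocorrelation(symbols, k) for k in range(7)]
```

For this choice the first period is 1, 0, 0, 1, 1, 1, 0, the symbols sum to +1 over a period, and the cyclic autocorrelation is 7 at lag zero and -1 at every other lag. Not every pair (b, c) yields the maximal period, so a different choice should be checked before use.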

In our application, we will make use of the following known 
properties of the sequences: 

1) $A_n'A_m = \sum_{i=1}^{N} a_{n-i} a_{m-i} = N$ for $n = m$, and $= -1$ for $n \ne m$. 

2) $Q'A_n = 1$, for $n = 1, \ldots, N$, where $Q = (1, 1, \ldots, 1)'$ is 
the all "1" vector. 

3) $A_{n+N} = A_n$, periodicity. 

Data sequences possessing properties 1 through 3 are also referred to 
as pseudorandom sequences. As a consequence of these properties, it 
is easy to verify the following: 
Property (a): 

$$ Z_N = \sum_{n=1}^{N} A_n A_n' = (N + 1)I - QQ', \quad N \times N \text{ matrix}, $$

where $I$ is the identity matrix. This decomposition is possible since 
the $ij$th element of the matrix $Z_N$, $(Z_N)_{ij}$, equals $N$ for $i = j$ and $-1$ on the 
off-diagonals. This can readily be seen from property 1. 
Property (b): The inverse matrix 

$$ Z_N^{-1} = \frac{I + QQ'}{N + 1}. $$

This can be derived from the matrix inversion lemma or verified by 
actually computing $Z_N Z_N^{-1} = I$. 
Property (c): The set of vectors 

$$ B_n = Z_N^{-1} A_n, \quad n = 1, \ldots, N, $$

are orthonormal to the vectors $A_n$, $n = 1, \ldots, N$. This is a crucial 
property to what follows, so we prove that 

$$ A_m' Z_N^{-1} A_n = \frac{1}{N + 1} A_m'(I + QQ')A_n = \frac{A_m'A_n + (A_m'Q)(Q'A_n)}{N + 1} = \begin{cases} 0, & n \ne m, \\ 1, & n = m. \end{cases} $$
These properties suggest an approach to a simple and rapidly converg- 
ing tap-adjustment algorithm. 

The key to the simplification of the algorithm, (11), is the recognition 
that $Z_l^{-1}$ for $l < N$ does not exist, and so it is not possible to start 
the algorithm at $l = 1$. The basic idea is to replace $Z_l$ by $Z_N$ and thus 
obtain the simpler algorithm 

$$ H_{n+1} = H_n - Z_N^{-1} A_n e_n, \tag{15} $$

where the error at time $n$ is again 

$$ e_n = A_n'H_n - r_n = A_n'(H_n - H) - \nu_n. $$

Note that this is a measurable quantity at each iteration and $A_n e_n$ is 
just the gradient of the instantaneous squared error. 

The algorithm expressed in (15) is remarkably simple since 

$$ Z_N^{-1} A_n = \frac{1}{N + 1}(I + QQ')A_n = \frac{1}{N + 1}(A_n + Q), $$

which follows from the definition of $Z_N$ and property (b), together 
with property 2, $Q'A_n = 1$. Inserting this into (15) we obtain 

$$ H_{n+1} = H_n - \frac{1}{N + 1}(A_n + Q)e_n. \tag{16} $$

This form is immediately recognized as a slightly modified stochastic 
gradient algorithm with step size equal to $1/(N + 1)$ and the gradient 
vector, $A_n$, replaced by $A_n + Q$. It is nothing more than the original 
vector, $A_n$, in which $a_n$'s equal to -1 are replaced by zero and $a_n = 1$ 
is replaced by $a_n = 2$. We wish to acknowledge that during the course 
of the development of this theory C. W. Farrow anticipated the form 
of this algorithm. 
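
The update (16) can be sketched as follows. This is a minimal noise-free illustration, assuming a period-7 pseudorandom sequence that has already been running long enough for every data vector to be full; the function names and the tap values are hypothetical.

```python
def data_vector(a_period, n):
    """A_n = (a_{n-N}, ..., a_{n-1})', reading the period cyclically."""
    N = len(a_period)
    return [a_period[(n - N + i) % N] for i in range(N)]


def fast_startup(a_period, r, H_start=None):
    """Run N steps of H_{n+1} = H_n - (A_n + Q) e_n / (N + 1), as in (16).

    r[n - 1] holds the received sample r_n for n = 1..N.
    """
    N = len(a_period)
    H = list(H_start) if H_start is not None else [0.0] * N
    for n in range(1, N + 1):
        A = data_vector(a_period, n)
        e = sum(x * h for x, h in zip(A, H)) - r[n - 1]  # e_n = A_n'H_n - r_n
        step = e / (N + 1)
        # A_n + Q has entries 2 (where a = +1) and 0 (where a = -1).
        H = [h - step * (x + 1) for h, x in zip(H, A)]
    return H


# Hypothetical noise-free test case with N = 7 taps.
a_period = [1, -1, -1, 1, 1, 1, -1]
H_true = [0.5, -0.25, 0.125, 0.0, 1.0, -1.0, 0.75]
r = [sum(x * h for x, h in zip(data_vector(a_period, n), H_true))
     for n in range(1, 8)]

H_est = fast_startup(a_period, r)
H_est_from_offset = fast_startup(a_period, r, H_start=[1.0] * 7)
```

In this noise-free setting both runs recover `H_true` to within rounding after exactly N = 7 updates, regardless of the starting taps. Each update touches only the taps where the data symbol is +1, so no multiplications by the data are needed.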

It now remains to demonstrate that (16) indeed converges "fast", 
by which we mean that it converges in N steps. Toward this end, 
define the error vector 

$$ \epsilon_n = H_n - H, $$

and rewrite (16) in the form 

$$ \epsilon_{n+1} = \epsilon_n - Z_N^{-1}A_n(A_n'\epsilon_n - \nu_n) = (I - B_nA_n')\epsilon_n + B_n\nu_n. \tag{17} $$

Iterating (17) yields explicitly 

$$ \epsilon_{n+1} = \prod_{k=1}^{n} (I - B_kA_k')\,\epsilon_1 + \sum_{k=1}^{n} \prod_{j=k}^{n-1} (I - B_{j+1}A_{j+1}')\,B_k\nu_k. \tag{18} $$

This is the general solution but because of property (c), which states 
that $A_n'B_{n-1} = 0$, we get a much simpler solution, which is the chief 
reason for the rapid convergence, namely, 

$$ \epsilon_{n+1} = \left[ I - Z_N^{-1} \sum_{k=1}^{n} A_kA_k' \right] \epsilon_1 + Z_N^{-1} \left( \sum_{k=1}^{n} A_k\nu_k \right). \tag{19} $$



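This simple form results because the product of the matrices in (18) 
reduces to 

$$ \begin{aligned} \prod_{k=1}^{n} (I - B_kA_k')\,\epsilon_1 &= (I - B_1A_1')(I - B_2A_2') \cdots (I - B_nA_n')\,\epsilon_1 \\ &= [I - B_1A_1' - B_2A_2' - \cdots - B_nA_n']\,\epsilon_1 \\ &= \left[ I - \sum_{k=1}^{n} B_kA_k' \right] \epsilon_1. \end{aligned} $$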



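The evolution of the error vector $\epsilon_n$ is guided by two components, 
the transient 

$$ t_n = \left[ I - Z_N^{-1} \sum_{k=1}^{n} A_kA_k' \right] \epsilon_1, \tag{20} $$

and the steady-state component 

$$ s_n = Z_N^{-1} \left( \sum_{k=1}^{n} A_k\nu_k \right). \tag{21} $$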



A most crucial property of the transient solution is that at $n = N$, $t_n$ 
vanishes, and this is the reason for claiming "fast" convergence. 
Clearly, the transient solution cannot vanish before this time since 
the inverse does not even exist, and therefore we claim that the algorithm 
converges in the least possible number of steps. This most important 
property of the algorithm can be seen from (20), since 

$$ t_N = (I - Z_N^{-1}Z_N)\,\epsilon_1 = 0. $$

Consequently, the error vector at time $N + 1$ consists only of 
measurement noise, or the steady-state component 

$$ \epsilon_{N+1} = Z_N^{-1} \left( \sum_{k=1}^{N} A_k\nu_k \right). \tag{22} $$

We have thus demonstrated that the algorithm converges to the 
true solution in N steps since the transient vanishes at the end of the 
Nth iteration and, from that time on, the taps fluctuate around the 
true value due to measurement noise, $\nu_n$, alone. The variance of the 
tap fluctuations can be calculated from the variance matrix 

$$ P_N = E[\epsilon_{N+1}\epsilon_{N+1}'] = Z_N^{-1} \left( \sum_{k=1}^{N} A_kA_k'\,E[\nu_k^2] \right) Z_N^{-1} = \sigma^2 Z_N^{-1} = \frac{\sigma^2}{N + 1}(I + QQ'), \tag{23} $$

where again we assumed that the $\nu_k$'s are identically and independently 
distributed. The error variance, $\sigma_H^2$, is therefore 

$$ \sigma_H^2 = \sigma^2 \operatorname{Trace} Z_N^{-1} = \frac{2N}{N + 1}\,\sigma^2. \tag{24} $$

This is precisely the value we obtained from solving the set of linear 
equations, (3), with pseudorandom input sequences. Thus, iterating N 
times provides the solution in a simple fashion. 
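
The agreement between N recursive steps and the direct solution of (3) can be illustrated numerically. The sketch below assumes a period-7 pseudorandom sequence and uses a fixed list of small numbers as a stand-in for the measurement noise; all names and values are hypothetical.

```python
a = [1, -1, -1, 1, 1, 1, -1]          # one period, N = 7 (assumed sequence)
N = len(a)
H_true = [0.5, -0.25, 0.125, 0.0, 1.0, -1.0, 0.75]
noise = [0.01, -0.02, 0.03, 0.0, -0.01, 0.02, -0.03]   # stand-in for nu_1..nu_N


def A(n):
    """A_n = (a_{n-N}, ..., a_{n-1})' with the period extended cyclically."""
    return [a[(n - N + i) % N] for i in range(N)]


def dot(x, y):
    """Inner product x'y."""
    return sum(p * q for p, q in zip(x, y))


# Received samples r_n = H'A_n + nu_n for n = 1..N.
r = [dot(A(n), H_true) + noise[n - 1] for n in range(1, N + 1)]

# N steps of the recursion H_{n+1} = H_n - (A_n + Q) e_n / (N + 1).
H_rec = [0.0] * N
for n in range(1, N + 1):
    e = dot(A(n), H_rec) - r[n - 1]
    H_rec = [h - e * (x + 1) / (N + 1) for h, x in zip(H_rec, A(n))]

# Direct solution H_N = Z_N^{-1} U_N with Z_N^{-1} = (I + QQ') / (N + 1).
U = [sum(A(n)[i] * r[n - 1] for n in range(1, N + 1)) for i in range(N)]
H_direct = [(u + sum(U)) / (N + 1) for u in U]

# Each diagonal entry of Z_N^{-1} is 2 / (N + 1), so the trace is 2N / (N + 1).
trace_Z_inv = 2 * N / (N + 1)
```

The two estimates coincide to within rounding, and both differ from `H_true` only by the noise term; for N = 7 the trace factor multiplying the noise variance is 1.75.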

It may turn out in applications that the value of variance obtained 
in N iterations is not sufficiently small. As seems reasonable, the noise 
variance can be reduced to any desired value by repeating the pseu- 
dorandom sequence of length N. To see this, consider a slightly 
modified tap-adjustment algorithm 

$$ H_{n+1} = H_n - \alpha Z_N^{-1} A_n e_n, \tag{25} $$

where now the scalar, $\alpha$, is a fixed step size yet to be determined. 
Proceeding as before, the recursion for the tap error $\epsilon_n = H_n - H$ now 
becomes 

$$ \epsilon_{n+1} = (I - \alpha B_nA_n')\,\epsilon_n + \alpha B_n\nu_n, \tag{26} $$

with the concomitant solution 

$$ \epsilon_{n+1} = \left[ I - \alpha Z_N^{-1} \left( \sum_{k=1}^{n} A_kA_k' \right) \right] \epsilon_1 + \alpha Z_N^{-1} \left( \sum_{k=1}^{n} A_k\nu_k \right). \tag{27} $$



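Again, examine the solution at $n = pN + 1$, where $p$ is a positive 
integer, to obtain 

$$ \epsilon_{pN+1} = \left[ I - \alpha Z_N^{-1} \left( \sum_{k=1}^{pN} A_kA_k' \right) \right] \epsilon_1 + \alpha Z_N^{-1} \left( \sum_{k=1}^{pN} A_k\nu_k \right). \tag{28} $$

Since $A_n$ is periodic with period $N$ (property 3), we conclude that 

$$ \epsilon_{pN+1} = (1 - \alpha p)\,\epsilon_1 + \alpha Z_N^{-1} \left( \sum_{k=1}^{pN} A_k\nu_k \right), \tag{29} $$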

and so we see that the transient component can be made to vanish 
when a = 1/p. A straightforward calculation indicates that with this 
choice of a the variance is 

$$ \sigma_H^2 = \frac{\sigma^2}{p} \operatorname{Trace} Z_N^{-1} = \frac{2N\sigma^2}{p(N + 1)}, \tag{30} $$

indicating a reduction by a factor of $p$, the number of times the 
pseudorandom sequence is repeated. 